AITopics | Omsk Oblast

Thinking LLMs solve complex tasks at the expense of increased compute and overthinking on simpler problems, while non-thinking LLMs are faster and cheaper but underthink on harder reasoning problems. This has led to the development of separate thinking and non-thinking LLM variants, leaving the onus of selecting the optimal model for each query on the end user. We introduce OptimalThinkingBench, a unified benchmark that jointly evaluates overthinking and underthinking in LLMs and also encourages the development of optimally-thinking models that balance performance and efficiency. Our benchmark comprises two sub-benchmarks: OverthinkingBench, featuring simple math and general queries in 72 domains, and UnderthinkingBench, containing 11 challenging reasoning tasks along with harder math problems. Using novel thinking-adjusted accuracy metrics, we extensively evaluate 33 different thinking and non-thinking models and show that no model is able to optimally think on our benchmark. Thinking models often overthink for hundreds of tokens on the simplest user queries without improving performance. In contrast, large non-thinking models underthink, often falling short of much smaller thinking models. We further explore several methods to encourage optimal thinking, but find that these approaches often improve on one sub-benchmark at the expense of the other, highlighting the need for better unified and optimal models in the future.

accuracy, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.13141

Country:

Europe > Russia > Northwestern Federal District > Kaliningrad Oblast > Kaliningrad (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine (1.00)
Media > Music (0.94)
Education (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Sample, Align, Synthesize: Graph-Based Response Synthesis with ConGrs

Ghosh, Sayan, Warraich, Shahzaib Saqib, Tarsadiya, Dhruv, Yauney, Gregory, Swayamdipta, Swabha

arXiv.org Artificial Intelligence, Oct-7-2025

Language models can be sampled multiple times to access the distribution underlying their responses, but existing methods cannot efficiently synthesize rich epistemic signals across different long-form responses. We introduce Consensus Graphs (ConGrs), a flexible DAG-based data structure that represents shared information, as well as semantic variation in a set of sampled LM responses to the same prompt. We construct ConGrs using a light-weight lexical sequence alignment algorithm from bioinformatics, supplemented by the targeted usage of a secondary LM judge. Further, we design task-dependent decoding methods to synthesize a single, final response from our ConGr data structure. Our experiments show that synthesizing responses from ConGrs improves factual precision on two biography generation tasks by up to 31% over an average response and reduces reliance on LM judges by more than 80% compared to other methods. We also use ConGrs for three refusal-based tasks requiring abstention on unanswerable queries and find that abstention rate is increased by up to 56%. We apply our approach to the MATH and AIME reasoning tasks and find an improvement over self-verification and majority vote baselines by up to 6 points of accuracy. We show that ConGrs provide a flexible method for capturing variation in LM responses and using the epistemic signals provided by response variation to synthesize more effective responses.
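The abstract above mentions a lightweight lexical sequence alignment algorithm from bioinformatics but does not name it. As a minimal sketch, assuming a Needleman-Wunsch style global alignment over whitespace tokens (the function name `align_tokens` and the scoring values are illustrative assumptions, not taken from the paper), aligning two sampled responses could look like this:

```python
def align_tokens(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two token lists (Needleman-Wunsch style).

    Returns a list of (token_a, token_b) pairs; None marks a gap.
    The scoring values are illustrative assumptions.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align both tokens
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# Two hypothetical sampled responses to the same prompt.
resp_a = "Marie Curie was born in Warsaw in 1867".split()
resp_b = "Marie Curie was born in 1867".split()
alignment = align_tokens(resp_a, resp_b)
# Tokens on which both responses agree; the rest is variation between samples.
shared = [x for x, y in alignment if x == y]
print(shared)
```

Aligned positions where the tokens agree correspond to shared information, while gaps and mismatches mark where the samples diverge. How ConGrs turn such alignments into a DAG, and where the secondary LM judge comes in, is not specified in the abstract and is not modeled here.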

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.03527

Country:

North America > United States > California (0.14)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(10 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.88)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Experiment on creating a neural network with weights determined by the potential of a simulated electrostatic field

Polad, Geidarov

arXiv.org Artificial Intelligence, Jul-8-2025

This paper explores the possibility of determining the weights and thresholds of a neural network using the potential -- a parameter of an electrostatic field -- without analytical calculations and without applying training algorithms. The work is based on neural network architectures employing metric recognition methods. The electrostatic field is simulated in the Builder C++ environment. In the same environment, a neural network based on metric recognition methods is constructed, with the weights of the first-layer neurons determined by the values of the potentials of the simulated electrostatic field. The effectiveness of the resulting neural network within the simulated system is evaluated using the MNIST test dataset under various initial conditions of the simulated system. The results demonstrated functional viability. The implementation of this approach shows that a neural network can obtain weight values almost instantaneously from the electrostatic field, without the need for analytical computations, lengthy training procedures, or massive training datasets.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.3103/S0147688222050161

2507.02933

Country:

Asia > Russia > Siberian Federal District > Omsk Oblast > Omsk (0.04)
Asia > Russia > Siberian Federal District > Novosibirsk Oblast > Novosibirsk (0.04)
Asia > Azerbaijan > Baku Economic Region > Baku (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Survey of Graph Transformers: Architectures, Theories and Applications

Yuan, Chaohao, Zhao, Kangfei, Kuruoglu, Ercan Engin, Wang, Liang, Xu, Tingyang, Huang, Wenbing, Zhao, Deli, Cheng, Hong, Rong, Yu

arXiv.org Artificial Intelligence, Feb-27-2025

Graph Transformers (GTs) have demonstrated a strong capability in modeling graph structures by addressing the intrinsic limitations of graph neural networks (GNNs), such as over-smoothing and over-squashing. Recent studies have proposed diverse architectures, enhanced explainability, and practical applications for Graph Transformers. In light of these rapid developments, we conduct a comprehensive review of Graph Transformers, covering their architectures, theoretical foundations, and applications. We categorize the architecture of Graph Transformers according to their strategies for processing structural information, including graph tokenization, positional encoding, structure-aware attention, and model ensemble. From the theoretical perspective, we examine the expressivity of Graph Transformers across the architectures discussed and contrast them with other advanced graph learning algorithms to uncover the connections. We also summarize the practical applications in which Graph Transformers have been used, such as molecule, protein, language, vision, traffic, brain, and material data. At the end of this survey, we discuss current challenges and prospective directions for future research on Graph Transformers.

graph, node, transformer, (14 more...)

arXiv.org Artificial Intelligence

2502.16533

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.27)
Asia > China > Hong Kong (0.04)
Asia > China > Fujian Province > Xiamen (0.04)
(11 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fair Railway Network Design

He, Zixu, Botan, Sirin, Lang, Jérôme, Saffidine, Abdallah, Sikora, Florian, Workman, Silas

arXiv.org Artificial Intelligence, Sep-3-2024

When designing a public transportation network in a country, one may want to minimise the sum of travel duration of all inhabitants. This corresponds to a purely utilitarian view and does not involve any fairness consideration, as the resulting network will typically benefit the capital city and/or large central cities while leaving some peripheral cities behind. On the other hand, a more egalitarian view will allow some people to travel between peripheral cities without having to go through a central city. We define a model, propose algorithms for computing solution networks, and report on experiments based on real data.
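The contrast between the utilitarian and egalitarian views can be illustrated with a toy example. The sketch below is not the authors' model: it assumes that travel demand between two cities is proportional to the product of their populations, and it uses the worst-case travel time between any pair as a stand-in egalitarian objective. The city names, populations, and link durations are invented for illustration.

```python
from itertools import combinations

INF = float("inf")


def all_pairs_travel_time(cities, edges):
    """Floyd-Warshall shortest travel times on an undirected network."""
    dist = {u: {v: (0 if u == v else INF) for v in cities} for u in cities}
    for (u, v), t in edges.items():
        dist[u][v] = dist[v][u] = min(dist[u][v], t)
    for k in cities:
        for i in cities:
            for j in cities:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def utilitarian_cost(cities, edges, population):
    """Total travel time, weighting each pair by the product of populations (assumption)."""
    dist = all_pairs_travel_time(cities, edges)
    return sum(population[u] * population[v] * dist[u][v]
               for u, v in combinations(cities, 2))


def egalitarian_cost(cities, edges):
    """Worst travel time between any two cities (one possible fairness proxy)."""
    dist = all_pairs_travel_time(cities, edges)
    return max(dist[u][v] for u, v in combinations(cities, 2))


cities = ["Capital", "A", "B", "C"]
population = {"Capital": 10, "A": 1, "B": 1, "C": 1}

# Star network: every peripheral city is linked only to the capital.
star = {("Capital", "A"): 3, ("Capital", "B"): 3, ("Capital", "C"): 1}
# Same number of links, but B connects to its neighbour A instead of the capital.
bypass = {("Capital", "A"): 3, ("A", "B"): 1, ("Capital", "C"): 1}

print(utilitarian_cost(cities, star, population), egalitarian_cost(cities, star))      # 84 6
print(utilitarian_cost(cities, bypass, population), egalitarian_cost(cities, bypass))  # 90 5
```

With these made-up numbers the star network wins on the utilitarian objective (84 versus 90), while the network with a direct peripheral link has the smaller worst-case travel time (5 versus 6), which is the kind of tension the abstract describes.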

algorithm, budget, gini index, (9 more...)

arXiv.org Artificial Intelligence

2409.02152

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
(64 more...)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Rail (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Communications > Networks (0.83)

Regional inflation analysis using social network data

Chsherbakov, Vasilii, Karpov, Ilia

arXiv.org Artificial Intelligence, Mar-14-2024

Inflation is one of the most important macroeconomic indicators and has a great impact on the population of any country and region. Inflation is influenced by a range of factors, one of which is inflation expectations. Many central banks take this factor into consideration when implementing monetary policy within an inflation targeting regime. Nowadays, many people are active users of the Internet, especially social networks. There is a hypothesis that people search for, read, and discuss mainly those issues that are of particular interest to them. It is logical to assume that price dynamics may also be a focus of user discussions, so such discussions could be regarded as an alternative, faster source of information about inflation expectations. This study uses unstructured data from the Vkontakte social network to analyze upward and downward inflationary trends, taking the Omsk region as an example. A sample of more than 8.5 million posts was collected between January 2010 and May 2022. The authors used BERT neural networks to solve the problem. These models demonstrated better results than the benchmarks (e.g., logistic regression and a decision tree classifier). This makes it possible to identify pro-inflationary and disinflationary keywords in different contexts and to visualize them with the SHAP method. This analysis provides additional operational information about inflationary processes at the regional level. The proposed approach can be scaled to other regions. At the same time, a limitation of the work is the time and power costs of the initial training of similar models for all regions of Russia.

inflation expectation, information, retrieved, (15 more...)

arXiv.org Artificial Intelligence

2403.00774

Country:

Asia > Russia > Siberian Federal District > Omsk Oblast > Omsk (0.28)
South America > Argentina (0.04)
North America > United States > Idaho > Latah County > Moscow (0.04)
(4 more...)

Genre: Research Report > New Finding (0.89)

Industry:

Government (1.00)
Banking & Finance > Economy (1.00)
Information Technology > Services (0.91)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

GCT-TTE: Graph Convolutional Transformer for Travel Time Estimation

Mashurov, Vladimir, Chopurian, Vaagn, Porvatov, Vadim, Ivanov, Arseny, Semenova, Natalia

arXiv.org Artificial Intelligence, Oct-15-2023

This paper introduces a new transformer-based model for the problem of travel time estimation. The key feature of the proposed GCT-TTE architecture is its use of different data modalities that capture different properties of an input path. Along with an extensive study of the model configuration, we implemented and evaluated a sufficient number of up-to-date baselines for path-aware and path-blind settings. The computational experiments confirmed the viability of our pipeline, which outperformed state-of-the-art models on both datasets considered. Additionally, GCT-TTE was deployed as a web service accessible for further experiments with user-defined routes.

dataset, dependency, time estimation, (14 more...)

arXiv.org Artificial Intelligence

2306.04324

Country:

Asia > Russia > Siberian Federal District > Republic of Khakassia > Abakan (0.07)
Asia > Russia > Siberian Federal District > Omsk Oblast > Omsk (0.07)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
(2 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (0.96)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Logistics, Graphs, and Transformers: Towards improving Travel Time Estimation

Semenova, Natalia, Porvatov, Vadim, Tishin, Vladislav, Sosedka, Artyom, Zamkovoy, Vladislav

arXiv.org Artificial Intelligence, Jul-12-2022

The problem of travel time estimation is widely considered the fundamental challenge of modern logistics. The complex interplay between the spatial aspects of roads and the temporal dynamics of ground transport still leaves room for experimentation. However, the total volume of currently accumulated data encourages the construction of learning models that have the potential to significantly outperform earlier solutions. To address the problem of travel time estimation, we propose a new method based on the transformer architecture: TransTTE.

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2207.05835

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.06)
Asia > Russia > Siberian Federal District > Omsk Oblast > Omsk (0.06)
Asia > Russia > Siberian Federal District > Republic of Khakassia > Abakan (0.05)
North America > United States > New York > New York County > New York City (0.05)

Genre: Research Report (0.40)

Industry: Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

The cyclic job-shop scheduling problem: The new subclass of the job-shop problem and applying the Simulated annealing to solve it

Matrenin, Pavel, Manusov, Vadim

arXiv.org Artificial Intelligence, Jun-18-2020

This paper describes a new approach to the scheduling problem. The approach addresses the planning of cyclic production and proposes treating such a scheduling problem as a cyclic job-shop problem of order k, where k is the number of repetitions. It was found that planning only one iteration of the loop is less effective than planning the entire cycle. For the experimental research, a number of test instances of the job-shop scheduling problem from the Operations Research Library were used. Simulated annealing was applied to solve the instances. The experiments showed that the proposed approach significantly increases the efficiency of cyclic scheduling.

artificial intelligence, machine learning, scheduling problem, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICIEAM.2016.7911676

2006.10938

Country:

Asia > Russia > Siberian Federal District > Novosibirsk Oblast > Novosibirsk (0.05)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
(2 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)
